
    Convex optimization over intersection of simple sets: improved convergence rate guarantees via an exact penalty approach

    We consider the problem of minimizing a convex function over the intersection of finitely many simple sets, each of which is easy to project onto. This is an important problem arising in various domains such as machine learning. The main difficulty lies in computing the projection of a point onto the intersection of many sets. Existing approaches yield an infeasible point with an iteration complexity of $O(1/\varepsilon^2)$ for nonsmooth problems, with no guarantee on the infeasibility. By reformulating the problem through exact penalty functions, we derive first-order algorithms which not only guarantee that the distance to the intersection is small but also improve the complexity to $O(1/\varepsilon)$, and to $O(1/\sqrt{\varepsilon})$ for smooth functions. For composite and smooth problems, this is achieved through a saddle-point reformulation where the proximal operators required by the primal-dual algorithms can be computed in closed form. We illustrate the benefits of our approach on a graph transduction problem and on graph matching.
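    A minimal sketch of the exact-penalty idea from the abstract, assuming each simple set is available only through its Euclidean projection; the function names, the penalty weight `rho`, and the toy sets are illustrative assumptions, not the paper's algorithm.

```python
import numpy as np

def exact_penalty_subgradient(grad_f, projections, x0, rho=10.0, steps=500):
    """Minimize f(x) + rho * sum_i dist(x, C_i) using only projections onto each C_i.

    grad_f      : gradient (or a subgradient) of the convex objective f
    projections : list of functions, each projecting a point onto one simple set C_i
    rho         : penalty weight (must exceed a problem-dependent threshold for exactness)
    """
    x = x0.astype(float)
    for t in range(1, steps + 1):
        g = grad_f(x)
        for proj in projections:
            p = proj(x)
            d = np.linalg.norm(x - p)
            if d > 1e-12:                      # subgradient of dist(., C_i) outside C_i
                g = g + rho * (x - p) / d
        x = x - (1.0 / np.sqrt(t)) * g         # diminishing step size
    return x

# Toy usage: minimize ||x - c||^2 over the intersection of a box and a halfspace.
c = np.array([2.0, 2.0])
grad_f = lambda x: 2.0 * (x - c)
proj_box = lambda x: np.clip(x, -1.0, 1.0)                     # projection onto [-1, 1]^2
proj_halfspace = lambda x: x - max(x.sum() - 1.0, 0.0) / 2.0   # projection onto {x1 + x2 <= 1}
x_star = exact_penalty_subgradient(grad_f, [proj_box, proj_halfspace], np.zeros(2))
```

    For the penalty to be exact, `rho` must exceed a problem-dependent constant; the improved rates in the paper come from primal-dual methods with closed-form proximal operators rather than this plain subgradient scheme.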

    How Many Pairwise Preferences Do We Need to Rank A Graph Consistently?

    We consider the problem of optimal recovery of the true ranking of $n$ items from a randomly chosen subset of their pairwise preferences. It is well known that, without any further assumption, a sample size of $\Omega(n^2)$ is required for this purpose. We analyze the problem with the additional structure of a relational graph $G([n],E)$ over the $n$ items, together with an assumption of \emph{locality}: neighboring items are similar in their rankings. Noting the preferential nature of the data, we choose to embed not the graph itself but its \emph{strong product}, so as to capture pairwise node relationships. Furthermore, unlike existing literature that uses Laplacian embeddings for graph-based learning problems, we use a richer class of graph embeddings, \emph{orthonormal representations}, which includes the (normalized) Laplacian as a special case. Our proposed algorithm, \emph{Pref-Rank}, predicts the underlying ranking using an SVM-based approach over the chosen embedding of the product graph, and is the first to provide \emph{statistical consistency} on two ranking losses, \emph{Kendall's tau} and \emph{Spearman's footrule}, with a required sample complexity of $O\big(n^2 \chi(\bar{G})\big)^{\frac{2}{3}}$ pairs, where $\chi(\bar{G})$ is the \emph{chromatic number} of the complement graph $\bar{G}$. Our sample complexity is thus smaller for dense graphs, with $\chi(\bar{G})$ characterizing the degree of node connectivity, which is also intuitive under the locality assumption: for example, $O(n^{\frac{4}{3}})$ for a union of $k$-cliques, or $O(n^{\frac{5}{3}})$ for random and power-law graphs, quantities much smaller than the fundamental limit of $\Omega(n^2)$ for large $n$. This, for the first time, relates ranking complexity to structural properties of the graph. We also report experimental evaluations on different synthetic and real datasets, where our algorithm is shown to outperform the state-of-the-art methods.
    Comment: In the Thirty-Third AAAI Conference on Artificial Intelligence, 2019.
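    A rough sketch of the pipeline the abstract describes: embed the strong product of the item graph with itself and train an SVM on observed preference pairs. The normalized-Laplacian embedding below stands in for the richer class of orthonormal representations, and the names (`pref_rank_sketch`, `strong_product_adjacency`) are hypothetical, not the authors' code.

```python
import numpy as np
from sklearn.svm import SVC

def strong_product_adjacency(A):
    """Adjacency matrix of the strong product G x G of an item graph with itself."""
    S = A + np.eye(A.shape[0])
    return np.kron(S, S) - np.eye(A.shape[0] ** 2)

def laplacian_embedding(P, dim):
    """Spectral embedding from the normalized Laplacian (one member of the
    orthonormal-representation family mentioned in the abstract)."""
    d = P.sum(axis=1)
    d[d == 0] = 1.0
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    L = np.eye(P.shape[0]) - D_inv_sqrt @ P @ D_inv_sqrt
    vals, vecs = np.linalg.eigh(L)
    return vecs[:, :dim]                       # smoothest eigenvectors as features

def pref_rank_sketch(A, observed_pairs, labels, dim=10):
    """Train an SVM on embedded product-graph nodes indexed by observed item pairs.

    observed_pairs : list of (i, j) item pairs with an observed preference
    labels         : +1 if item i is preferred over item j, -1 otherwise
    """
    n = A.shape[0]
    X = laplacian_embedding(strong_product_adjacency(A), dim)
    idx = [i * n + j for i, j in observed_pairs]   # product-graph node for pair (i, j)
    clf = SVC(kernel="linear").fit(X[idx], labels)
    return clf, X
```

    A full ranking could then be read off by aggregating the classifier's predicted preferences per item; the paper's consistency guarantees depend on the exact choice of embedding and loss, which this sketch does not reproduce.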

    Second order cone programming approaches for handling missing and uncertain data

    We propose a novel second-order cone programming formulation for designing robust classifiers that can handle uncertainty in observations. Similar formulations are also derived for designing regression functions that are robust to uncertainties in the regression setting. The proposed formulations are independent of the underlying distribution, requiring only the existence of second-order moments. These formulations are then specialized to the case of missing values in observations for both classification and regression problems. Experiments show that the proposed formulations outperform imputation.
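    A hedged sketch of a moment-based robust classifier of this flavor, written with cvxpy (the solver choice, the multiplier formula, and the helper name are assumptions, not the paper's code): each training point is known only through its mean and covariance, and a Chebyshev-style bound turns the probabilistic margin requirement into a second-order cone constraint.

```python
import numpy as np
import cvxpy as cp

def robust_socp_classifier(X, y, Sigmas, eta=0.9, C=1.0):
    """Robust soft-margin classifier: example i is known only through mean X[i] and
    covariance Sigmas[i]; the margin must hold with probability >= eta for any
    distribution with those first two moments (second-order cone constraints)."""
    n, d = X.shape
    kappa = np.sqrt(eta / (1.0 - eta))        # multiplier from the Chebyshev bound
    w, b = cp.Variable(d), cp.Variable()
    xi = cp.Variable(n, nonneg=True)
    cons = []
    for i in range(n):
        S_half = np.linalg.cholesky(Sigmas[i] + 1e-9 * np.eye(d))
        cons.append(y[i] * (X[i] @ w + b) >= 1 - xi[i] + kappa * cp.norm(S_half.T @ w, 2))
    prob = cp.Problem(cp.Minimize(0.5 * cp.sum_squares(w) + C * cp.sum(xi)), cons)
    prob.solve()
    return w.value, b.value
```

    Setting all covariances to zero recovers the standard soft-margin SVM, which is one way to sanity-check the formulation.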

    Random Separating Hyperplane Theorem and Learning Polytopes

    The Separating Hyperplane Theorem is a fundamental result in convex geometry with myriad applications. Our first result, the Random Separating Hyperplane Theorem (RSH), is a strengthening of this theorem for polytopes. RSH asserts that if the distance between a point $a$ and a polytope $K$ with $k$ vertices and unit diameter in $\Re^d$ is at least $\delta$, where $\delta$ is a fixed constant in $(0,1)$, then a randomly chosen hyperplane separates $a$ and $K$ with probability at least $1/\mathrm{poly}(k)$ and margin at least $\Omega\left(\delta/\sqrt{d}\right)$. An immediate consequence of our result is the first near-optimal bound on the error increase in the reduction from a separation oracle to an optimization oracle over a polytope. RSH has algorithmic applications in learning polytopes. We consider a fundamental problem, denoted the ``Hausdorff problem'', of learning a unit-diameter polytope $K$ within Hausdorff distance $\delta$, given an optimization oracle for $K$. Using RSH, we show that with polynomially many random queries to the optimization oracle, $K$ can be approximated within error $O(\delta)$. To our knowledge this is the first provable algorithm for the Hausdorff problem. Building on this result, we show that if the vertices of $K$ are well separated, then an optimization oracle can be used to generate a list of points, each within Hausdorff distance $O(\delta)$ of $K$, with the property that the list contains a point close to each vertex of $K$. Further, we show how to prune this list to generate a (unique) approximation to each vertex of the polytope. We prove that in many latent variable settings, e.g., topic modeling and LDA, optimization oracles do exist, provided we project to a suitable SVD subspace. Thus, our work yields the first efficient algorithm for finding approximations to the vertices of the latent polytope under the well-separatedness assumption.
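    A toy sketch of the two ingredients in the abstract, with hypothetical names and a simplex standing in for the latent polytope: an optimization oracle queried at random directions yields support points approximating $K$, and a random-direction test illustrates the separating-hyperplane event that RSH bounds. The pruning and vertex-recovery steps are omitted.

```python
import numpy as np

def random_unit_vector(d, rng):
    v = rng.standard_normal(d)
    return v / np.linalg.norm(v)

def approximate_polytope(opt_oracle, d, num_queries, rng=None):
    """Query the optimization oracle at random directions; the returned support points
    give an inner approximation of K, in the spirit of the Hausdorff-problem algorithm."""
    rng = rng or np.random.default_rng(0)
    return np.array([opt_oracle(random_unit_vector(d, rng)) for _ in range(num_queries)])

def random_hyperplane_separates(a, vertices, trials, margin, rng=None):
    """Empirical version of the RSH event: does some random direction put a on the far
    side of all vertices of K by at least the given margin?"""
    rng = rng or np.random.default_rng(0)
    for _ in range(trials):
        u = random_unit_vector(a.shape[0], rng)
        if u @ a >= np.max(vertices @ u) + margin:     # a is separated with this margin
            return True
    return False

# Toy usage: K is the unit simplex in R^3; the oracle maximizes a linear function over K.
simplex = np.eye(3)
oracle = lambda u: simplex[np.argmax(simplex @ u)]
pts = approximate_polytope(oracle, d=3, num_queries=200)
print(random_hyperplane_separates(np.array([1.0, 1.0, 1.0]), simplex, trials=1000, margin=0.3))
```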